Keyword [DQN]
Bellver M, Giró-i-Nieto X, Marqués F, et al. Hierarchical Object Detection with Deep Reinforcement Learning[J]. Advances in Parallel Computing, 2016
1. Overview
In this paper, an intelligent agent is trained to decide where to look for objects in an image:
- proposal strategy: hierarchical sub-regions, with and without overlap
- feature extraction strategy: image-zoom vs. crop-pool
- the overlap + zoom combination performs best
- cast the problem as a Markov Decision Process (MDP)
1.1. Related Work
- RL has been applied to Classification, Captioning, Activity Recognition
- Region proposal is expensive (R-CNN)
- AttentionNet: casts detection as iterative classification
- SSD
- Active Object Localization (agent).
2. Methods
2.1. MDP Formulation
- state: descriptor of the current region + a memory vector of the last 4 actions (each action one-hot over the 6 actions, so 6 × 4 = 24 dimensions)
- action: 5 movement actions + 1 terminal action (6 in total)
- reward (see the sketch after this list):
  - movement actions: sign of the IoU change, $R(s, s') = \mathrm{sign}(\mathrm{IoU}(b', g) - \mathrm{IoU}(b, g))$
  - terminal action: $+\eta$ if $\mathrm{IoU}(b, g) \ge \tau$, else $-\eta$
  - with τ = 0.5 and η = 3
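A minimal Python sketch of this reward, assuming axis-aligned boxes given as (x1, y1, x2, y2); the function and constant names are mine, not the paper's:

```python
TAU, ETA = 0.5, 3.0  # IoU threshold and terminal reward magnitude

def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def movement_reward(prev_box, new_box, gt_box):
    """Sign of the IoU improvement after a movement action: +1, 0 or -1."""
    diff = iou(new_box, gt_box) - iou(prev_box, gt_box)
    return (diff > 0) - (diff < 0)

def terminal_reward(box, gt_box):
    """+eta if the final region covers the object well enough, -eta otherwise."""
    return ETA if iou(box, gt_box) >= TAU else -ETA
```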
2.2. Q-learning
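The notes leave this section as a heading only; for reference, the standard one-step Q-learning update that the DQN approximates (with learning rate α and discount factor γ) is

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

In the DQN setting, the target $r + \gamma \max_{a'} Q(s', a')$ is produced by a neural network over the state descriptor instead of a table.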
2.3. Proposal Strategy
- overlap: each sub-region covers 0.75 of its parent, so neighbouring sub-regions overlap (see the sketch below)
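A sketch of one plausible reading of the 0.75 factor: each of the five sub-regions (four corners plus a centre) spans 0.75 of the parent's width and height, so neighbours overlap; the exact geometry in the paper may differ.

```python
def sub_regions(box, scale=0.75):
    """Five candidate sub-regions of a parent box (x1, y1, x2, y2): four
    corner regions plus a central one, each spanning `scale` of the parent's
    width/height. With scale=0.75 the sub-regions overlap; with scale=0.5
    the four corner regions would tile the parent exactly."""
    x1, y1, x2, y2 = box
    w, h = (x2 - x1) * scale, (y2 - y1) * scale
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return [
        (x1, y1, x1 + w, y1 + h),                           # top-left
        (x2 - w, y1, x2, y1 + h),                           # top-right
        (x1, y2 - h, x1 + w, y2),                           # bottom-left
        (x2 - w, y2 - h, x2, y2),                           # bottom-right
        (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2),   # centre
    ]
```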
2.4. Model
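These notes leave the model section empty; below is a hedged PyTorch sketch of a plausible Q-network for this setup, mapping the state (region descriptor concatenated with the 24-dim action-history memory) to 6 Q-values. The hidden sizes and dropout are my assumptions, not taken from the paper.

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """State vector (region descriptor + 24-dim action history) -> 6 Q-values."""
    def __init__(self, state_dim, n_actions=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 1024), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(1024, n_actions),
        )

    def forward(self, state):
        return self.net(state)
```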
2.5. Exploration-Exploitation
- ε-greedy policy: start with ε = 1 and decrease it in steps of 0.1 until ε = 0.1
- training starts with mostly random actions, and at each epoch the agent relies more on the already learnt policy
- to help the agent learn the terminal action, it is forced whenever the current region has an IoU > 0.5 with the ground truth (see the sketch after this list)
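A small Python sketch of this exploration scheme; the action indexing and function names are my own:

```python
import random

N_ACTIONS, TERMINAL = 6, 5  # assumed indexing: action 5 is the terminal action

def epsilon_for_epoch(epoch):
    """epsilon starts at 1.0 and drops by 0.1 per epoch until it reaches 0.1."""
    return max(1.0 - 0.1 * epoch, 0.1)

def select_action(q_values, epsilon, current_iou):
    """epsilon-greedy selection with guided terminal actions (training only,
    since the IoU with the ground truth must be known)."""
    if current_iou > 0.5:
        return TERMINAL  # force the terminal action on good regions
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_values[a])
```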
2.6. Training Parameters
- DQN
- Adam optimizer, learning rate 1e-6
- 50 epochs
- discount factor: γ = 0.9
2.7. Experience Replay
- transitions are stored as tuples (s, a, r, s′)
- consecutive experiences are highly correlated, which leads to inefficient and unstable learning
- to address this, random minibatches are sampled from a replay memory of 1000 experiences with batch size 100 (see the sketch after this list)
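A PyTorch sketch of the replay memory and one DQN update, using the hyper-parameters from 2.6 and 2.7 (replay memory of 1000 transitions, minibatch of 100, γ = 0.9); the code structure and the `QNetwork` usage are my assumptions, not the paper's implementation, and each state is assumed to be a flat tensor:

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

GAMMA, BUFFER_SIZE, BATCH_SIZE = 0.9, 1000, 100

replay_memory = deque(maxlen=BUFFER_SIZE)   # stores (s, a, r, s_next, done)

def store(s, a, r, s_next, done):
    replay_memory.append((s, a, r, s_next, done))

def train_step(q_net, optimizer):
    """One DQN update on a random minibatch, which breaks the correlation
    between consecutive experiences."""
    if len(replay_memory) < BATCH_SIZE:
        return
    batch = random.sample(replay_memory, BATCH_SIZE)
    states, actions, rewards, next_states, dones = zip(*batch)
    states, next_states = torch.stack(states), torch.stack(next_states)
    actions = torch.tensor(actions, dtype=torch.long)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    # Q-learning target: r + gamma * max_a' Q(s', a'), no bootstrap at terminal states
    with torch.no_grad():
        target = rewards + GAMMA * (1 - dones) * q_net(next_states).max(dim=1).values
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

With the settings from 2.6 this would be driven by `optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-6)` and repeated `train_step` calls over 50 epochs.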
3. Experiments
3.1. Region Proposal
- with fewer than 3 steps, the agent can already approximate almost all of the objects it is able to detect
3.2. Zoom vs Crop-Pool
- the image-zoom model performs better